I. INTRODUCTION

Flavor-changing neutral-current transitions such as
b → sνν̄ are absent at tree level in the Standard Model
(SM) and occur only via electroweak penguin diagrams
or one-loop box diagrams with virtual heavy particles
in the loops, as shown in Fig. 1. Because such loop
production processes are generally suppressed, the SM
predicts b → sνν̄ transitions to be very rare. We report
herein the results of a search for the exclusive decay mode
B⁺ → K⁺νν̄ (charge conjugation is implied throughout
this paper). The SM prediction of the branching fraction
is B(B⁺ → K⁺νν̄) = (3.8 +1.2/−0.6) × 10⁻⁶, and the most
stringent published upper limit at 90% confidence level is
B(B⁺ → K⁺νν̄) < 1.4 × 10⁻⁵ [2], set by the Belle
Collaboration with 535 × 10⁶ BB̄ pairs. This analysis serves
as an independent measurement of the limit on this branching
fraction.

With current luminosity, we do not have the sensitivity
to measure a branching fraction at the level of the SM;
however, several new physics models may increase the rate
of b → sνν̄ transitions. The Minimal Supersymmetric
Standard Model with large tan β [3] leads to higher rates
through chargino and/or charged Higgs contributions to
the loop diagram. "Unparticle models" [4] can give an
observed enhancement because of the similarities in decay
signatures between a B⁺ → K⁺νν̄ decay and a decay
containing an unparticle. Models with a single universal
extra dimension [5] enhance the observed rate at lower
values of 1/R, where R is the compactification radius of
the extra dimension. Also, models with light scalar dark
matter candidates having GeV/c² or sub-GeV/c² masses
[6] may increase the observed rate because of the similarity
in decay signatures between B⁺ → K⁺νν̄ and a
decay containing two dark matter scalars.



FIG. 1: The b → sνν̄ transition proceeding through a penguin
diagram (top) and a box diagram (bottom).



Due to the presence of multiple neutrinos, the B⁺ → K⁺νν̄
decay mode lacks the kinematic constraints that
are usually exploited in B decay searches at B factories to
reject both continuum (non-BB̄) and BB̄ backgrounds.
The strategy adopted for this analysis is to reconstruct
an exclusive decay of the B⁻ meson in the event, the
"tag B," in one of several semileptonic decay modes. All
remaining charged and neutral particles in the event are
examined under the assumption that they are products of
the accompanying B decay, the "signal B." We perform a
multivariate analysis using a random forest classifier
(explained below) to separate signal events from background
events. We keep the signal region of the classifier output
blind to avoid experimenter bias. The random forest
classifier introduces systematic uncertainties very different
from those of this collaboration's previous measurement [7].



II. THE BABAR DETECTOR AND DATASET 

The data used in this analysis were collected with the
BABAR detector [8] at the PEP-II asymmetric e⁺e⁻ storage
ring. The sample corresponds to an integrated luminosity
of 319 fb⁻¹ at the Υ(4S) resonance, and consists
of about 351 × 10⁶ BB̄ pairs. Charged-particle
tracking and dE/dx measurements for particle identification
(PID) are provided by a five-layer double-sided
silicon vertex tracker (SVT) and a 40-layer drift chamber
(DCH) in a 1.5 T axial magnetic field. A ring imaging
Cherenkov detector (DIRC) is used for π-K discrimination.
The energies of neutral particles are measured
by an electromagnetic calorimeter (EMC) consisting of
6580 CsI(Tl) crystals. The magnetic flux return of the
solenoid, instrumented with resistive plate chambers and
limited streamer tubes, provides muon identification.

A GEANT4-based [9] Monte Carlo (MC) simulation is
used to model the BABAR detector response, taking into
account the varying accelerator and detector conditions.
Dedicated signal and background MC samples are used
to estimate the signal selection efficiency and determine
the expected number of background events. Simulation
samples are used to model BB̄ events and continuum
e⁺e⁻ → uū, dd̄, ss̄, cc̄, and τ⁺τ⁻ events (background MC
events). A sample of 1.37 × 10⁶ events is simulated in
which the B⁺ meson decays to K⁺νν̄, and the B⁻ meson
decays to a mode with at least one lepton in the final state
(signal MC events).



III. ANALYSIS METHOD 

A. Tag B Reconstruction 

The tag B reconstruction combines a D⁰ meson with
a single identified charged lepton to form a D⁰ℓ candidate.
The lepton candidate must have PID information
consistent with an electron or a muon, have a minimum
transverse momentum of 0.1 GeV/c, and have at least 20
hits in the DCH. The D⁰ candidates are reconstructed in
three decay modes: K⁻π⁺, K⁻π⁺π⁻π⁺, and K⁻π⁺π⁰.
The charged pions from the D⁰ decay must have a polar
angle between 0.41 and 2.54 radians. The K⁻ candidate
must fail π⁻ PID requirements based on the candidate's
measured dE/dx for lower momentum candidates, or on
the Cherenkov angle, number of photons, and track quality
measured in the DIRC for higher momentum candidates.
The π⁰ candidates are required to have a reconstructed
mass in the range 0.115 < m_{π⁰} < 0.150 GeV/c², and
an energy measured in the laboratory frame greater
than 0.2 GeV. The reconstructed D⁰ mass, m_{D⁰}, must
be within 0.04 GeV/c² (0.07 GeV/c²) of the nominal D⁰
mass for the channels without (with) a π⁰ in the final
state, and the center of mass momentum, p_{D⁰}, must be
greater than 0.5 GeV/c. The invariant mass, m_{D⁰ℓ}, of the
D⁰ℓ candidate must be greater than 3.0 GeV/c².

Assuming that a neutrino is the only particle missing
from a genuine B⁻ → D⁰ℓ⁻ν̄ decay, the cosine of the angle
between the direction of the reconstructed tag B and
that of the D⁰ℓ candidate, described by the four vector
(E_{D⁰ℓ}, p_{D⁰ℓ}), is given by

$$
\cos\theta_{B,D^0\ell} = \frac{2 E_B E_{D^0\ell} - m_B^2 - m_{D^0\ell}^2}{2\,|\vec{p}_{D^0\ell}|\,\sqrt{E_B^2 - m_B^2}} \qquad (1)
$$

where m_B is the nominal B meson mass and E_B and
√(E_B² − m_B²) are the expected B meson energy and
momentum, respectively, fixed by the energies of the beams
and evaluated in the center of mass frame. We retain
events in the interval −2.5 < cos θ_{B,D⁰ℓ} < 1.1. These
bounds are outside the allowed physical region to maintain
efficiency for B⁻ → D*⁰ℓ⁻ν̄ decays in which a π⁰
or photon has not been reconstructed as part of the D⁰ℓ
combination and to account for resolution effects. If more
than one D⁰ℓ candidate is reconstructed in a given event,
the one with the smallest |cos θ_{B,D⁰ℓ}| is retained. The
m_{D⁰} distribution remains unbiased by this method of
choosing the best candidate, allowing us to later use it
for background estimations (Sec. III C).
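As an illustration of Eq. (1) and of the retained interval, the following minimal sketch evaluates cos θ_{B,D⁰ℓ} for a single D⁰ℓ candidate. The numerical values of m_B and E_B are assumptions introduced here (a nominal B mass of 5.279 GeV/c² and half the Υ(4S) mass for the B energy); they are not quoted in the text, and the function names are hypothetical.

```python
import math

# Assumed constants, not taken from the text: nominal B meson mass and the
# B energy in the center of mass frame (half the Y(4S) mass).
M_B = 5.279  # GeV/c^2
E_B = 5.290  # GeV


def cos_theta_b_d0l(e_d0l, p_d0l, m_b=M_B, e_b=E_B):
    """Evaluate Eq. (1) for one D0-lepton candidate.

    e_d0l, p_d0l: center of mass energy and momentum magnitude of the
    D0-lepton system (GeV and GeV/c).
    """
    m_d0l_sq = e_d0l ** 2 - p_d0l ** 2        # invariant mass squared
    p_b = math.sqrt(e_b ** 2 - m_b ** 2)      # expected B momentum
    numerator = 2.0 * e_b * e_d0l - m_b ** 2 - m_d0l_sq
    return numerator / (2.0 * p_d0l * p_b)


def passes_tag_window(cos_theta, low=-2.5, high=1.1):
    """Interval retained in the text: -2.5 < cos(theta) < 1.1."""
    return low < cos_theta < high
```

For a candidate in which only a massless neutrino is missing, the returned value equals the true cosine and lies in the physical range; unreconstructed π⁰ or photon daughters shift it below −1, which is why the lower bound is loose.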

B. Signal Event Selection 

Events containing a reconstructed tag B are examined
for evidence of a B⁺ → K⁺νν̄ decay. We require that the
number of charged tracks remaining after the tag B has
been reconstructed is less than 4, and that the missing
energy in the event is greater than 2.5 GeV. The signal K⁺
candidate must satisfy PID criteria, have a polar angle
between 0.469 and 2.457 radians (the angular acceptance
of the DIRC), and have a charge opposite that of the lepton
in the tagged B decay. After applying these preliminary
requirements, the MC samples are reweighted to reproduce
the selection efficiencies determined on the data.
We use a bagged decision tree multivariate classifier
(random forest classifier [13]), available in the
StatPatternRecognition software package [11], to discriminate
signal events from background. This method, a powerful
alternative to "rectangular cuts," separates events of
different categories by training many decision trees (sequential
partitioning of data into subsets of similar characteristics
starting from a root node) [12]. We choose to use a random
forest classifier instead of another multivariate classifier
because of its stability with higher dimensionality
(more input variables), its training stability (the performance
is less likely to diminish with continued training),
and its insensitivity to input variables with weak
discriminating power.

We use the 19 variables listed in Table I in the classifier,
the most important of which are the number of charged
tracks not used in the tag B reconstruction, the total
energy of signal side photons each with energy greater
than 0.05 GeV (lower energy photons are not modeled
as well in our MC events), the signal-kaon momentum,
and the missing energy of the event. Distributions of
these variables (Fig. 2) show a clear difference between
signal events and the different types of background.

To avoid overtraining of the classifier, we separate the
MC events equally between two samples. The classifier is
trained on one sample, while the other is used to determine
if the classifier is overtrained and what the signal
and background efficiencies are when cutting on the
classifier output. We train the classifier by optimizing the
Punzi figure of merit (FOM) [13],



$$
\mathrm{FOM} = \frac{N_s}{0.5\,N_\sigma + \sqrt{N_b}} \qquad (2)
$$

where N_s is the expected number of signal events
(assuming the SM branching fraction), N_b is the number of
expected background events, and N_σ is the sigma level of
significance, taken here to be 3. The output of the random
forest classifier is shown in Fig. 3 for MC simulated
signal and background events. Based on MC predictions,
the multivariate classifier improves the FOM by 34% over
rectangular selection requirements.
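The figure of merit defined above, N_s / (0.5 N_σ + √N_b) with N_σ = 3, can be sketched in a few lines, together with a simple scan of candidate thresholds on the classifier output. The scan is an assumption made for illustration (the text does not describe how the optimization is carried out), and the function names, per-event weights, and inputs are hypothetical.

```python
import math


def punzi_fom(n_sig, n_bkg, n_sigma=3.0):
    """Punzi figure of merit: N_s / (0.5 * N_sigma + sqrt(N_b))."""
    return n_sig / (0.5 * n_sigma + math.sqrt(n_bkg))


def best_threshold(sig_scores, bkg_scores, sig_weight, bkg_weight, thresholds):
    """Return (threshold, fom) maximizing the FOM for 'score > threshold'.

    sig_weight and bkg_weight scale MC event counts to expected yields;
    both are hypothetical inputs for this sketch.
    """
    best = None
    for t in thresholds:
        n_s = sig_weight * sum(1 for s in sig_scores if s > t)
        n_b = bkg_weight * sum(1 for s in bkg_scores if s > t)
        fom = punzi_fom(n_s, n_b)
        if best is None or fom > best[1]:
            best = (t, fom)
    return best
```

Because the denominator contains the constant term 0.5 N_σ, the FOM stays finite when the expected background vanishes, which is one reason it is convenient for rare-decay searches.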

We define the signal region by requiring the output
of the multivariate classifier be greater than 0.82. This
selection is optimized using signal and background MC
samples to maximize the FOM given in Eq. (2). When
combined with the efficiency of reconstructing the tag B,
this gives a total signal efficiency of ε = (0.16 ± 0.02)%.
The quoted error is the quadratic sum of the statistical,
theoretical, and systematic uncertainties; we discuss the
latter two below.



C. Background Estimation 

We divide the distribution of the invariant mass of the
D⁰ candidate into a lower sideband, a signal region, and an
upper sideband, defined in Table II. This variable shows no
correlation with the output of the random forest. We identify
two types of background in our signal region: combinatoric
backgrounds, which are linear in the m_{D⁰} distribution
shown in Fig. 4, and peaking backgrounds, which
correspond to true D⁰ candidates and peak in the m_{D⁰}
distribution. The combinatoric background level is estimated
from the number of events in the m_{D⁰} sidebands
that pass all cuts. The level of combinatoric background
expected in the signal region is 22 ± 5 events, where the
uncertainty is statistical.

FIG. 2: Several important variables that are used in the random forest classifier: (a) the number of charged tracks in the event
not used for tag B reconstruction, (b) the total energy of photons (E_γ > 0.05 GeV) not used in the tag B reconstruction, (c)
the momentum of the signal B kaon candidate, and (d) the missing energy. These are shown for the signal MC events (bold
black-outlined histogram) and the different types of background; starting from the bottom of the stacked histogram, these are
B⁺B⁻ (hatched), B⁰B̄⁰ (light grey), cc̄ (hatched), and uū, ss̄, dd̄ (black). The signal MC distribution has been normalized to
unit area and the background classes have been normalized by an arbitrary constant, but reflect the relative amounts of each
class as expected in the data.
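The sideband estimate of the combinatoric background can be sketched as follows. For a background that is linear in m_{D⁰} and sidebands placed symmetrically about the signal region, the expected yield in the signal region is the sideband count scaled by the ratio of window widths. The window boundaries are those of Table II; the scaling procedure itself, the function names, and the use of a simple √N statistical uncertainty are assumptions, since the text does not spell out the exact method.

```python
# m_D0 windows from Table II, in GeV/c^2:
# (lower sideband, signal region, upper sideband)
WINDOWS = {
    "Kpi_or_Kpipipi": ((1.8245, 1.8445), (1.8445, 1.8845), (1.8845, 1.9045)),
    "Kpipi0": ((1.7945, 1.8295), (1.8295, 1.8995), (1.8995, 1.9345)),
}


def width(window):
    low, high = window
    return high - low


def combinatoric_estimate(n_lower, n_upper, mode):
    """Scale sideband counts into the signal region (linear background).

    Returns (estimate, statistical uncertainty). n_lower and n_upper are
    the numbers of events passing all cuts in each sideband.
    """
    lower, signal, upper = WINDOWS[mode]
    scale = width(signal) / (width(lower) + width(upper))
    n_side = n_lower + n_upper
    return scale * n_side, scale * n_side ** 0.5
```

With the Table II boundaries, the two sidebands together are as wide as the signal region in every mode, so the scale factor is one and the estimate is simply the total sideband count.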

To evaluate the peaking backgrounds, we start with the
random forest output of events in the m_{D⁰} signal region
after subtracting the non-peaking part using the m_{D⁰}
sidebands. This distribution is produced for both MC
and data, and their ratio is shown in Fig. 5. We fit a line to
the points in which the output of the classifier is less than
0.82 (the background region) and extrapolate it into the
signal region. The slope of the line is different from zero and
yields a multiplicative correction of 1.29 to the peaking
component of our MC sample in the signal region that
accounts for discrepancies between data and simulated
events. The level of peaking background expected in the
signal region is 9 ± 10 events, where the uncertainty is
statistical.
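The fit-and-extrapolate step can be illustrated with an unweighted straight-line fit to the ratio points below the 0.82 cut, evaluated at the midpoint of the signal region. Several choices here are assumptions not fixed by the text: the points are taken to be data-over-MC ratios (if the MC-over-data ratio is fitted instead, the correction is the reciprocal), the fit is unweighted, and the line is evaluated at the midpoint of the interval from 0.82 to 1. The function names are hypothetical.

```python
def fit_line(xs, ys):
    """Unweighted least-squares fit of y = intercept + slope * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def extrapolated_ratio(xs, ys, cut=0.82, upper=1.0):
    """Fit bins with classifier output below `cut`, then evaluate the
    fitted line at the midpoint of the signal region [cut, upper]."""
    below = [(x, y) for x, y in zip(xs, ys) if x < cut]
    intercept, slope = fit_line([x for x, _ in below], [y for _, y in below])
    midpoint = 0.5 * (cut + upper)
    return intercept + slope * midpoint
```

Only bins in the background region enter the fit, so the signal region remains blind while the correction is derived.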



IV. SYSTEMATIC UNCERTAINTY STUDIES 

We vary the functions used to determine the estimated
number of combinatoric and peaking background
events in order to obtain the associated systematic
uncertainties. This yields a systematic uncertainty of ±1.9
events for the combinatoric background estimate and
±3.2 events for the peaking background estimate. We
associate an uncertainty with reweighting the MC samples
after the preliminary requirements (±3.0 events), taken
to be the difference between the number of events
expected with and without this reweighting.
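As a cross-check of the arithmetic, the background estimates and their uncertainties quoted so far can be combined as below. Adding all terms in quadrature assumes that they are uncorrelated; the text does not state how the combination is performed, so this is one reading that happens to agree with the 31 ± 12 background events quoted in Sec. V.

```python
import math


def quadrature(*terms):
    """Square root of the sum of squares of the given terms."""
    return math.sqrt(sum(t * t for t in terms))


combinatoric, peaking = 22.0, 9.0   # expected events (Sec. III C)
stat = (5.0, 10.0)                  # statistical uncertainties (Sec. III C)
syst = (1.9, 3.2, 3.0)              # combinatoric, peaking, MC reweighting

total = combinatoric + peaking
total_unc = quadrature(*stat, *syst)
print(f"{total:.0f} +/- {total_unc:.0f}")  # prints "31 +/- 12"
```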

The systematic uncertainty associated with the requirement
on the output of the random forest classifier
is evaluated by selecting a double tag sample in both the
data and the MC events in which both B mesons decay
semileptonically. Using particle substitution on one
of the two semileptonically-tagged B mesons (D⁰ → K,
ℓ⁺ → ν), we model the distributions of the variables that
are included in the random forest classifier to resemble
the signal MC events' distributions. This second tagged
B meson serves as a control sample to estimate the difference
in selection efficiency between MC events and data
of the random forest classifier. This difference between
the double tag data and MC efficiencies (5.2%) is assigned
as the systematic uncertainty for how well the
MC sample models the data. An additional contribution
of 9.3% accounts for the difference between our control
sample (double tag MC events) and our signal MC sample.
Added in quadrature, the total uncertainty due to
the random forest selection is 10.7%.

TABLE I: Descriptions of the variables input to the random
forest classifier.

Tag B Variables

1. Number of charged tracks used to reconstruct the B

2. Number of π⁰'s in the D⁰ decay mode

3. Cosine of the angle between the thrust axis and the z-axis
in the center of mass

4. Total momentum transverse to the z-axis

5. Cosine of the angle of the momentum vector of the D⁰ℓ
candidate to the z-axis in the center of mass

6. Center of mass momentum of the lepton candidate

7. Cosine of the angle between the tag side momentum of the
combined D⁰ℓ and the momentum of the parent B meson
in the center of mass

Signal B Variables

8. Number of charged tracks remaining after reconstruction of
the tag side

9. Total energy of photons with E_γ > 50 MeV, after the
tag side reconstruction

10. Momentum of the signal side kaon in the center of mass

11. Number of photons with E_γ > 50 MeV, remaining after the
tag side reconstruction

12. Cosine of the angle between the signal side kaon and the
tag side lepton candidate in the center of mass

13. Cosine of the angle between the signal side kaon and the
tag side D⁰ candidate in the center of mass

14. Cosine of the angle between the signal side kaon and the
tag side D⁰ℓ candidate in the center of mass

15. Number of π⁰ candidates remaining after tag side
reconstruction

Event Variables

17. Amount of undetected energy

18. Amount of undetected mass

19. 2nd Fox-Wolfram moment [14]

TABLE II: The mode-specific m_{D⁰} (in units of GeV/c²) sideband
definitions. The boundaries for the third mode differ
from those of the first two due to the presence of a π⁰.

D⁰ mode       Lower Sideband   Signal Region    Upper Sideband
Kπ, Kπππ      1.8245-1.8445    1.8445-1.8845    1.8845-1.9045
Kππ⁰          1.7945-1.8295    1.8295-1.8995    1.8995-1.9345

FIG. 3: The output of the random forest classifier for the signal
MC events (bold black-outlined histogram) and the different
types of background; starting from the bottom of the
stacked histogram these are B⁺B⁻ (hatched), B⁰B̄⁰ (light
grey), cc̄ (hatched), and uū, ss̄, dd̄ (black). The signal MC
distribution has been normalized to unit area and the background
classes have been normalized by an arbitrary constant,
but reflect the relative amounts of each class as expected in
the data.

Additional systematic uncertainties associated with
the B⁺ → K⁺νν̄ signal efficiency include uncertainties
in the tagging efficiency (6.5%), in the PID criteria used
to identify the signal kaon (3.5%), and in the tracking
efficiency (0.5%). We evaluate the tagging uncertainty
using the double tag sample, with both D⁰s decaying to
K⁻π⁺. We take the ratio of the efficiency of finding both
tags to the efficiency of finding one tag in both the MC
sample and the data. We then compare these ratios to
determine the associated systematic uncertainty.

The theoretical uncertainty on the K⁺ momentum
spectrum in B⁺ → K⁺νν̄ decays results in a 3.1% uncertainty
on the signal efficiency. This uncertainty is
evaluated by comparing the efficiency obtained using the
kaon momentum spectrum given in Ref. [1] and the efficiency
obtained when the sample is generated using a
phase space model for the decay. These theoretical and
systematic uncertainties on the signal efficiency are
summarized in Table III.

FIG. 5: The ratio of the number of MC events to the number
of data events in the m_{D⁰} signal region of each bin of the random
forest classifier's output after background subtraction.
The linear fit described in the text is also shown.

TABLE III: Systematic uncertainties on signal efficiency (in
%).

Source                     Uncertainty (%)
Random Forest Selection    10.7
Tagging                    6.5
Kaon PID                   3.5
Tracking                   0.5
Kaon Momentum              3.1
Total                      13.4
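The totals in Table III follow from adding the individual contributions in quadrature, as the short check below illustrates. The same operation reproduces the 10.7% assigned to the random forest selection from its 5.2% and 9.3% components. The dictionary and function names are hypothetical.

```python
import math

# Relative uncertainties on the signal efficiency, in percent (Table III).
SIGNAL_EFFICIENCY_SYSTEMATICS = {
    "Random Forest Selection": 10.7,
    "Tagging": 6.5,
    "Kaon PID": 3.5,
    "Tracking": 0.5,
    "Kaon Momentum": 3.1,
}


def quadrature_sum(values):
    """Square root of the sum of squares."""
    return math.sqrt(sum(v * v for v in values))


random_forest = quadrature_sum([5.2, 9.3])
total = quadrature_sum(SIGNAL_EFFICIENCY_SYSTEMATICS.values())
print(round(random_forest, 1), round(total, 1))  # prints "10.7 13.4"
```

A 13.4% relative uncertainty on an efficiency of 0.16% corresponds to about 0.02% in absolute terms, in line with the quoted ε = (0.16 ± 0.02)%.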



FIG. 4: A comparison of the invariant mass distributions of
the D⁰ candidates for the MC events and the data; the MC
predictions are the grey rectangles (the height representing
the uncertainty) and the data are the black points with error
bars. The boundary between the signal and sideband regions
is shown by the vertical black lines. Plot (a) is the Kπ mode,
(b) is the Kπππ mode, and (c) is the Kππ⁰ mode.



V. RESULTS 

We expect 31 ± 12 background events in the signal region,
and the SM predicts 2.2 signal events. The signal
efficiency is (0.16 ± 0.02)%, which, using the Bayesian
procedure described in [15], yields an expected upper
limit of 3.1 × 10⁻⁵ at 90% confidence level. We observe
38 events in the signal region. The distributions of the
random forest classifier's output for the data and background
Monte Carlo events are shown in Fig. 6. Because
the number of events we see is consistent with the expected
background, we interpret these results in the context
of the SM, and set an upper limit on the branching
fraction of B(B⁺ → K⁺νν̄) < 4.5 × 10⁻⁵ at 90% confidence
level. This is an improvement over BABAR's previous
search for this mode [3], which set an upper limit of
B(B⁺ → K⁺νν̄) < 5.2 × 10⁻⁵ using both semileptonic
and hadronic tagged events, and is consistent with the
recent, more stringent, limit by the Belle Collaboration.
FIG. 6: The distribution of the random forest classifier's
output for data (crosses). The expected range for the number
of background events is shown as the grey boxes. The vertical
black bar shows the cut on the random forest classifier's
output: the signal region is to the right.